POEM: 1-Bit Point-Wise Operations Based on E-M for Point Cloud Processing


Algorithm 12 POEM training. L is the loss function (the sum of L_S and L_R) and N is the number of layers. Binarize() binarizes the filters using the binarization in Eq. 6.36, and Update() updates the parameters according to our update scheme.

Input: a minibatch of inputs and their labels, unbinarized weights w, scale factor α, learning rate η.
Output: updated unbinarized weights w^{t+1}, updated scale factor α^{t+1}.

1:  {1. Computing the gradients with respect to the parameters:}
2:  {1.1. Forward propagation:}
3:  for i = 1 to N do
4:      b^{w_i} ← Binarize(w_i) (using Eq. 6.36)
5:      Bi-FC feature calculation using Eq. 6.87 – 6.72
6:      Loss calculation using Eq. 6.88 – 6.44
7:  end for
8:  {1.2. Backward propagation:}
9:  for i = N to 1 do
10:     {Note that the gradients are not binary.}
11:     Compute δ_w using Eq. 6.89 – 6.59
12:     Compute δ_α using Eq. 6.60 – 6.62
13:     Compute δ_p using Eq. 6.63 – 6.64
14: end for
15: {Accumulating the parameter gradients:}
16: for i = 1 to N do
17:     w^{t+1} ← Update(δ_w, η) (using Eq. 6.89)
18:     α^{t+1} ← Update(δ_α, η) (using Eq. 6.61)
19:     p^{t+1} ← Update(δ_p, η) (using Eq. 6.64)
20:     η^{t+1} ← Update(η) according to the learning rate schedule
21: end for
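To make the control flow of Algorithm 12 concrete, the following is a minimal, self-contained PyTorch sketch of a single training iteration for one Bi-FC layer. It is only an illustrative approximation, not the reference implementation: the binarization uses a plain sign() with a straight-through estimator rather than Eq. 6.36, the reconstruction term is a generic stand-in for L_R, and the EM term and the additional parameters p are omitted; only the forward/backward/update structure of the algorithm is reproduced.

```python
import torch
import torch.nn.functional as F

# Hypothetical single-layer setup: all shapes and hyperparameters are illustrative.
torch.manual_seed(0)
w = torch.randn(10, 64, requires_grad=True)         # latent (unbinarized) weights
alpha = torch.ones(1, requires_grad=True)           # scale factor
eta, lam = 1e-2, 1e-4                               # learning rate and trade-off weight

x = torch.randn(32, 64)                             # a minibatch of point features
labels = torch.randint(0, 10, (32,))

# 1.1 Forward propagation: binarize the weights, compute Bi-FC features and the loss.
bw = torch.sign(w).detach() + w - w.detach()        # sign() with a straight-through estimator
logits = F.linear(x, alpha * bw)
L_S = F.cross_entropy(logits, labels)               # supervision loss
L_R = 0.5 * ((w - alpha * bw.detach()) ** 2).sum()  # generic reconstruction-style loss (stand-in)
loss = L_S + lam * L_R

# 1.2 Backward propagation: the gradients of w and alpha are real-valued, not binary.
loss.backward()

# Accumulating the parameter gradients and updating (the EM term and the extra
# parameters p are omitted here; see the sketch after Eq. 6.59).
with torch.no_grad():
    w -= eta * w.grad
    alpha.copy_((alpha - eta * alpha.grad).abs())   # mirrors the |.| update of Eq. 6.61
    w.grad = None
    alpha.grad = None
```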

Then, we optimize $w_i^j$ as

$$\delta_{w_i^j} = \frac{\partial L_S}{\partial w_i^j} + \lambda \frac{\partial L_R}{\partial w_i^j} + \tau\,\mathrm{EM}(w_i^j), \qquad (6.58)$$

where $\tau$ is the hyperparameter that controls the proportion of the Expectation-Maximization operator $\mathrm{EM}(w_i^j)$. $\mathrm{EM}(w_i^j)$ is defined as

$$\mathrm{EM}(w_i^j) =
\begin{cases}
\sum_{k=1}^{2} \hat{\xi}_i^{jk}\,\bigl(\hat{\mu}_i^k - w_i^j\bigr), & \hat{\mu}_i^1 < w_i^j < \hat{\mu}_i^2, \\
0, & \text{otherwise}.
\end{cases} \qquad (6.59)$$
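The case analysis in Eq. 6.59 amounts to a responsibility-weighted pull of each weight toward the two cluster means, applied only to weights lying strictly between the means. The sketch below is one possible reading of the formula; mu_hat (the two estimated means, with mu_hat[0] < mu_hat[1]) and xi_hat (the E-step responsibilities) are assumed inputs, and the names are illustrative rather than taken from the reference code.

```python
import torch

def em_operator(w, mu_hat, xi_hat):
    """EM(w) of Eq. 6.59 (illustrative reading, not reference code).

    w:      latent weights of one Bi-FC layer, shape (J,)
    mu_hat: estimated cluster means, shape (2,), with mu_hat[0] < mu_hat[1]
    xi_hat: responsibilities of each weight w.r.t. the two clusters, shape (J, 2)
    """
    # Sum over k of xi_hat[j, k] * (mu_hat[k] - w[j]): pulls each weight toward
    # the cluster means, weighted by its responsibilities.
    pull = (xi_hat * (mu_hat.unsqueeze(0) - w.unsqueeze(1))).sum(dim=1)
    # The operator is only active for weights lying strictly between the means.
    inside = (w > mu_hat[0]) & (w < mu_hat[1])
    return torch.where(inside, pull, torch.zeros_like(w))

def delta_w(grad_LS, grad_LR, w, mu_hat, xi_hat, lam, tau):
    """delta_w of Eq. 6.58: supervision + reconstruction gradients + EM term."""
    return grad_LS + lam * grad_LR + tau * em_operator(w, mu_hat, xi_hat)
```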

Updating $\alpha_i$: We further update the scale factor $\alpha_i$ with $w_i$ fixed. $\delta_{\alpha_i}$ is defined as the gradient of $\alpha_i$, and we have

$$\delta_{\alpha_i} = \frac{\partial L_S}{\partial \alpha_i} + \lambda \frac{\partial L_R}{\partial \alpha_i}, \qquad (6.60)$$

$$\alpha_i \leftarrow \bigl|\,\alpha_i - \eta\,\delta_{\alpha_i}\bigr|, \qquad (6.61)$$

where $\eta$ is the learning rate. The gradient derived from the softmax loss can easily be calculated by backpropagation. Based on Eq. 6.44, we have

$$\frac{\partial L_R}{\partial \alpha_i} = \bigl(w_i - \alpha_i\, b^{w_i}\bigr) \cdot b^{w_i}. \qquad (6.62)$$
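Assuming the reconstruction of Eqs. 6.60–6.62 above, the scale-factor update can be sketched as follows; grad_LS_alpha stands for the softmax-loss gradient obtained by backpropagation, and the sign of the reconstruction gradient should be checked against Eq. 6.44, which is not reproduced in this section.

```python
import torch

def update_alpha(alpha, w, bw, grad_LS_alpha, lam, eta):
    """Scale-factor update of Eqs. 6.60-6.61 (sketch under the assumptions above)."""
    # Eq. 6.62: reconstruction gradient w.r.t. alpha (sign as reconstructed above).
    grad_LR_alpha = ((w - alpha * bw) * bw).sum()
    # Eq. 6.60: combine the supervision and reconstruction gradients.
    delta_alpha = grad_LS_alpha + lam * grad_LR_alpha
    # Eq. 6.61: gradient step followed by |.|, which keeps alpha non-negative.
    return (alpha - eta * delta_alpha).abs()
```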